Welcome to my Data Science project on A/B TESTING ANALYSIS for a website survey.
The project utilizes a dataset with features for more than 8,000 users of a website, as well as their click-through response to two different versions of a survey. The results of the test are analyzed and evaluated through data exploration, sanity checks and statistical tests. Recommendations are provided on whether it is safe and worthwhile to launch the experimental version "B" of the survey. In addition, a new A/B test is proposed, and its required sample size is estimated based on the desired statistical power and minimum detectable effect.
I also invite you to visit My LinkedIn profile and see my other projects on My GitHub profile.
Sincerely,
Michail Mavrogiannis
Import libraries below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from scipy import stats
import statsmodels as sm
import statsmodels.api as sma
import statsmodels.stats.power       # makes sm.stats.power available below
import statsmodels.stats.proportion  # makes sm.stats.proportion available below
Import dataset into dataframe "dfr".
dfr = pd.read_csv('C:/Users/Michael/Desktop/Data Science- MM/ABTest/Data.csv')
dfr.head()
The experiment tests the click-through probability for two versions of a website survey. The survey consists of a single yes/no question, to which users respond through "radio" buttons. No information is provided about the format of the survey (popup, widget, or other) or about the content of the question. This project therefore focuses on the click-through probability of each version of the survey, i.e. whether the users clicked on the survey at all. It does not consider the probability of each possible response (yes or no).
Description of Features:
auction_id: ID of the survey impression, unique per user.
experiment: Whether the user belongs to 'control' or 'exposed' (experiment) group.
date: Date in YYYY-MM-DD format.
hour: Hour in HH format.
device_make: Make and model of the user device.
platform_os: Operating System of the user device platform, represented by a code.
browser: Browser on which the user sees the website and the survey.
yes: 1 if the user responded "yes" to the survey question through the "radio buttons"; 0 if the user either responded "no" or did not click on the survey at all*.
no: 1 if the user responded "no" to the survey question through the "radio buttons"; 0 if the user either responded "yes" or did not click on the survey at all*.
*If both columns 'yes' and 'no' are zero, it means that the user did not click on the survey at all.
dfr.info()
The dataset used as input in this project was obtained from the following post on kaggle.com: https://www.kaggle.com/osuolaleemmanuel/ad-ab-testing?select=AdSmartABdata+-+AdSmartABdata.csv. I would like to thank the author of the post, Osuolale Emmanuel, for granting me permission to use the dataset.
Column 'auction_id' is renamed to the more intuitive 'user_id' and 'platform_os' is renamed to 'operating_sys':
dfr.rename(columns = {'auction_id': 'user_id', 'platform_os': 'operating_sys'}, inplace = True)
The 'experiment' column currently contains the string values "exposed" or "control". These are changed to 1 for the experiment group and 0 for the control group.
dfr['experiment'].unique()
dfr['experiment'] = dfr['experiment'].map({'exposed': 1, 'control': 0})
Column 'date' currently contains strings, which are converted into datetime objects below. New columns are also added for the day of the week and for the concatenated date and day of the week:
dfr['date'] = pd.to_datetime(dfr['date'])
dct1 = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
dfr['day'] = dfr['date'].dt.dayofweek.map(dct1)
dfr['date, day'] = dfr['date'].dt.date.astype(str) + ', ' + dfr['day']
A new column is created to show whether the user clicked on the survey at all; 0: did not click, 1: clicked.
dfr['clicked'] = ((dfr['yes'] == 1) | (dfr['no'] == 1)).astype(int)
Information on the content of the survey question is not available, therefore responses "yes" or "no" cannot be interpreted in a way insightful for the A/B testing experiment. Columns 'yes' and 'no' are then removed:
dfr.drop(['yes', 'no'], axis = 1, inplace = True)
dfr = dfr[['user_id', 'experiment', 'date', 'day', 'date, day', 'hour', 'operating_sys', 'browser', 'device_make',
'clicked']]
Updated dataframe per the preprocessing above:
dfr.head()
Description of Updated Features:
user_id: ID of the survey impression, unique per user.
experiment: Whether the user belongs to control group: 0, or experiment group: 1.
date: Date in YYYY-MM-DD format.
day: Day of the week.
date, day: Concatenated date and day of the week.
hour: Hour in HH format.
operating_sys: Operating System of the user device platform, represented by a code.
browser: Browser on which the user sees the website and the survey.
device_make: Make and model of the user device.
clicked: Whether the user clicked: 1, or did not click: 0, on the survey.
dfr.info()
No null values are seen so far. Further checks will be performed in the next sections.
dfr.describe()
After the completion of the A/B test, the first step is to perform sanity checks on the results. Specifically, in this section:
len(dfr), dfr['user_id'].nunique()
test_summary = pd.DataFrame()
test_summary['Total # of Users'] = dfr.groupby('experiment').count()['user_id']
test_summary['Users Clicked'] = dfr[dfr['clicked'] == 1].groupby('experiment').count()[['user_id']]
test_summary['Click-Thru Prob.'] = (test_summary['Users Clicked'] / test_summary['Total # of Users']).round(3)
test_summary
sns.countplot(data = dfr, x = 'experiment')
plt.title('Number of users per group'); plt.ylabel('# of users'); plt.show()
The total numbers of users in the control and the experiment groups are comparable, but not equal. Whether the difference is statistically significant or could have happened by luck will be checked in two ways (for the sake of reference): using a confidence interval and a hypothesis test.
ci = sma.stats.proportion_confint(nobs = len(dfr), count = len(dfr[dfr['experiment'] == 0]), alpha= 0.05, method = 'normal')
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
The above confidence interval includes 0.5, thus the difference is not statistically significant.
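For reference, this normal-approximation interval can be reproduced by hand with the textbook formula p̂ ± z·√(p̂(1−p̂)/n). The counts below are illustrative stand-ins, not the actual dataset values:

```python
import math
from scipy import stats

# Illustrative counts (stand-ins, not the actual dataset values):
n = 8077       # total users
count = 4071   # users assigned to the control group

p_hat = count / n
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
z = stats.norm.ppf(1 - 0.05 / 2)          # two-sided 95% critical value (~1.96)
ci_manual = (p_hat - z * se, p_hat + z * se)
print(ci_manual)
```

If 0.5 lies inside the interval, the observed split is consistent with a 50/50 assignment.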
ht = sma.stats.proportions_ztest(nobs = len(dfr), count = len(dfr[dfr['experiment'] == 0]), value = 0.5,
alternative='two-sided', prop_var=False)
print('The Z-statistic is {} and its p-value is {}.'.format(ht[0].round(3), ht[1].round(3)))
The p-value is greater than the significance level, therefore the null hypothesis (i.e. that population's probability of assignment to control group is 0.5) cannot be rejected.
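The z-statistic itself follows from z = (p̂ − 0.5) / √(p(1−p)/n). The sketch below uses the variance under the null hypothesis and illustrative counts; note that `proportions_ztest` uses the sample proportion for the variance by default, which gives a nearly identical result here since p̂ ≈ 0.5:

```python
import math
from scipy import stats

# Illustrative counts (stand-ins, not the actual dataset values):
n, count = 8077, 4071
p_hat = count / n

# One-sample z-test of H0: p = 0.5, with the variance taken under H0
z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(round(z, 3), round(p_value, 3))
```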
sns.barplot(data = dfr, x = 'experiment', y = 'clicked', estimator = np.mean, ci = None, palette = 'tab20c_r')
plt.title('Click-Through Probability per group'); plt.ylabel('click-through probability'); plt.show()
The observed difference between the click-through probability of the control and experiment group will not be checked for significance yet. This is contingent upon the outcome of the sanity checks of the following sections.
As mentioned previously, part of the sanity checks is to verify that comparable numbers of users from the control and experiment groups are assigned to each user "slice". User slices are defined by the different values of the dataset features. The following sections explore the features one by one; however, it is useful to first identify any features that stand out when predicting whether a user belongs to the control or the experiment group. Such features indicate an unequal user distribution between the groups.
A decision tree is used below, with a "min_impurity_decrease" limit determined after trial-and-error. The training set of the model includes the predictors: hour, date (dummies), operating_sys, browser (dummies). The response variable is whether a user belongs to control or experiment group, i.e. column 'experiment'. No train/test set split is required here.
from sklearn.tree import DecisionTreeClassifier, plot_tree
mdl = DecisionTreeClassifier(min_impurity_decrease = 0.01)
dfr_tree = pd.concat([dfr[['hour', 'operating_sys']], pd.get_dummies(dfr['date']), pd.get_dummies(dfr['browser'])], axis = 1)
mdl.fit(dfr_tree, dfr['experiment'])
plt.figure(figsize=(15,12)); plot_tree(mdl, max_depth = 20, fontsize = 18); plt.show()
dfr_tree.columns[[2, 13, 0]]
The features (or values of categorical features) that stand out are:
This information will be taken into account during the data exploration of the next sections.
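As a sanity check of the tree-based approach itself, the sketch below uses synthetic data (not the survey dataset): a feature correlated with group assignment should absorb nearly all of the tree's importance, while a pure-noise feature is pruned away by the `min_impurity_decrease` limit.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                         # 0 = control, 1 = experiment
leaky = (group ^ (rng.random(n) < 0.2)).astype(int)   # agrees with 'group' ~80% of the time
noise = rng.integers(0, 2, n)                         # unrelated to 'group'

X = np.column_stack([noise, leaky])
tree = DecisionTreeClassifier(min_impurity_decrease = 0.01).fit(X, group)
print(tree.feature_importances_)   # nearly all importance lands on column 1 ('leaky')
```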
plt.figure(figsize = (7,4))
sns.countplot(x = dfr['date, day'].sort_values(), color = 'lightblue')
plt.title('Total number of users per date'); plt.xlabel('date'); plt.xticks(rotation = 70); plt.ylabel('# of users')
plt.show()
As seen in the histogram for total number of users per date:
plt.figure(figsize = (7,4))
sns.countplot(x = dfr['date, day'].sort_values(), hue = dfr['experiment'])
plt.title('Number of users per date & group'); plt.xlabel('date'); plt.xticks(rotation = 70); plt.ylabel('# of users')
plt.show()
As seen in the histogram for total number of users per date and group:
plt.figure(figsize = (7,4))
sns.barplot(x = dfr['date, day'].sort_values(), y = dfr['clicked'], estimator = np.mean,
hue = dfr['experiment'], ci = None, palette = 'tab20c_r')
plt.title('Click-Through Probability per date & group'); plt.xlabel('date'); plt.xticks(rotation = 70)
plt.ylabel('click-through probability'); plt.legend(loc = [1.01, 0.78], title = 'experiment'); plt.show()
On 7 of the 8 days of the test, the experiment group has a higher click-through probability than the control group. A sign test is performed below, to check whether this behavior could have happened by luck or shows a significant trend. We treat the days as a sequence of Bernoulli trials whose binary outcome is whether the control or the experiment group has the higher click-through probability. Assuming a success probability of 50% for either outcome, and considering a significance level of 5% for a two-tailed test, the p-value of the observed behavior is found below:
print('The p-value for the sign test is {}.'.format(round(2 * (1 - stats.binom.cdf(k = 7-1, n = 8, p = 0.5)), 2)))
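The same two-tailed sign test can be obtained directly from SciPy's exact binomial test (available as `binomtest` in SciPy >= 1.7):

```python
from scipy import stats

# The experiment group had the higher click-through probability on 7 of 8 days
result = stats.binomtest(k = 7, n = 8, p = 0.5, alternative = 'two-sided')
print(round(result.pvalue, 2))   # 0.07, matching the hand computation above
```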
plt.figure(figsize = (24,11))
i = 1
for j in dfr['date, day'].sort_values().unique():
    plt.subplot(2, 4, i)
    plt.hist(dfr[dfr['date, day'] == j]['hour'], bins = np.arange(25)-0.5, color = 'Lightblue')
    i = i + 1
    plt.title('{}'.format(j), fontsize = 14); plt.xticks(range(0, 24)); plt.xlabel('hour', fontsize = 14)
    plt.ylabel('Total # of Users', fontsize = 14); plt.ylim(0, 150)
As seen in the histogram of total number of users per date and hour:
plt.figure(figsize = (24,11))
i = 1
for j in dfr['date, day'].sort_values().unique():
    plt.subplot(2, 4, i)
    plt.hist([dfr[(dfr['date, day'] == j) & (dfr['experiment'] == 0)]['hour'],
              dfr[(dfr['date, day'] == j) & (dfr['experiment'] == 1)]['hour']],
             bins = np.arange(25)-0.5, label = [0, 1], color = ['steelblue', 'darkorange'], stacked = False)
    i = i + 1
    plt.title('{}'.format(j), fontsize = 14); plt.xticks(range(0, 24)); plt.xlabel('hour', fontsize = 14)
    plt.ylabel('# of Users', fontsize = 14); plt.ylim(0, 150); plt.legend(title = 'experiment', fontsize = 14)
The first subplot is plotted enlarged below, i.e. the number of users per group on Friday 2020-07-03:
plt.figure(figsize = (5,5))
sns.countplot(x = dfr[dfr['date'] == '2020-07-03']['hour'], hue = dfr[dfr['date'] == '2020-07-03']['experiment'],
order = range(24))
plt.title('2020-07-03, Friday'); plt.xticks(range(0, 24)); plt.ylabel('# of Users')
plt.legend(loc = 'upper left', title = 'experiment'); plt.show()
As seen on the histograms of number of users per group and date/hour:
This confirms the decision tree's earlier indication that feature 'hour' has a highly uneven distribution of users between the control and experiment groups!
There appears to be a bug in the assignment algorithm, causing a sharp spike of control users on Friday 07/03 at 15:00. On that day, control users appear only during the 15:00 hour, and they outnumber all the experiment users several times over; during the remaining hours of the day, only experiment users appear. Such an uneven distribution across hours can produce Simpson's-paradox behavior when the overall results of the experiment are analyzed.
Except for Friday 07/03, on every other day the experiment users systematically and considerably outnumber the control users in each hour. Splitting users exactly evenly between the control and experiment groups is challenging, since an even split is needed within every user "slice" (day, hour, browser, etc.), so small divergences are expected; here, however, there is a clear trend that cannot have happened by luck.
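A minimal synthetic illustration of how such an imbalance can flip the aggregate result (the counts are made up, not taken from this dataset):

```python
# (clicks, impressions) per (group, hour slice) -- hypothetical numbers
counts = {
    ('control',    'hour_15'): (90, 200),   # 45% click-through
    ('experiment', 'hour_15'): (10, 20),    # 50% click-through
    ('control',    'other'):   (2, 20),     # 10% click-through
    ('experiment', 'other'):   (30, 200),   # 15% click-through
}

def rate(group, slice_):
    clicks, n = counts[(group, slice_)]
    return clicks / n

def overall(group):
    clicks = sum(c for (g, _), (c, n) in counts.items() if g == group)
    total = sum(n for (g, _), (c, n) in counts.items() if g == group)
    return clicks / total

# The experiment group wins within EVERY slice...
assert rate('experiment', 'hour_15') > rate('control', 'hour_15')
assert rate('experiment', 'other') > rate('control', 'other')
# ...yet the control group wins in the aggregate (Simpson's paradox):
print(round(overall('control'), 3), round(overall('experiment'), 3))   # 0.418 vs 0.182
```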
plt.figure(figsize = (22,10))
i = 1
for j in dfr['date, day'].sort_values().unique():
    plt.subplot(2, 4, i)
    sns.barplot(x = dfr[dfr['date, day'] == j]['hour'], y = dfr[dfr['date, day'] == j]['clicked'],
                hue = dfr[dfr['date, day'] == j]['experiment'],
                estimator = np.mean, ci = None, palette = 'tab20c_r', order = range(24))
    i = i + 1
    plt.title('{}'.format(j), fontsize = 14); plt.legend(title = 'experiment', fontsize = 14, loc = 'upper left')
    plt.xticks(range(0, 24)); plt.xlabel('hour', fontsize = 14)
    plt.ylabel('Click-Through Probability', fontsize = 14); plt.ylim(0, 1.1)
As seen from the histograms of click-through probability per group and date/hour:
dfr['operating_sys'].value_counts()
The platform operating-system column contains the coded values 5, 6 and 7, of which 5 and 6 dominate. In the following we need to:
For the above, information from column 'device_make' will also be utilized.
dfr['device_make'].value_counts()
The 3 most frequent (specific) phones in the dataset are the iPhone, the Samsung SM-G960F and the Samsung SM-G950F. The iPhone is known to run IOS, and the two Samsung phones run Android. Below we check which coded operating systems these phones have in the dataset:
top_3 = dfr[dfr['device_make'].apply(lambda x: x in ['iPhone', 'Samsung SM-G960F', 'Samsung SM-G950F'])]
top_3.groupby(['device_make', 'operating_sys']).count()[['user_id']]
Apparently, operating system '5' corresponds to IOS and '6' corresponds to Android. There are five iPhones misclassified as having Android, which is corrected below:
dfr.loc[dfr['device_make'] == 'iPhone', 'operating_sys'] = 5
Check what operating system '7' is:
dfr[dfr['operating_sys'] == 7]
Lumia 950 has operating system Microsoft Windows, which corresponds to code '7'. At this point we conclude that all devices of the dataset utilize either Android, or IOS, or Microsoft Windows, thus they are mobile devices.
The operating system codes are renamed below based on the information found:
dfr['operating_sys'] = dfr['operating_sys'].map({5: 'IOS', 6: 'Android', 7: 'Windows'})
sns.countplot(data = dfr, x = 'operating_sys', color = 'lightblue')
plt.title('Total number of users per operating_sys'); plt.ylabel('# of users'); plt.show()
As seen in the histogram above:
sns.countplot(data = dfr, x = 'operating_sys', hue = 'experiment')
plt.title('Number of users per operating_sys & group'); plt.legend(loc = 'upper right', title = 'experiment')
plt.ylabel('# of users'); plt.show()
For Android, check if the difference of users between control and experiment groups could have happened by luck, or if it is statistically significant:
ci = sma.stats.proportion_confint(nobs = len(dfr[dfr['operating_sys'] == 'Android']),
count = len(dfr[(dfr['operating_sys'] == 'Android') & (dfr['experiment'] == 0)]),
alpha= 0.05, method = 'normal')
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
For IOS, check if the difference of users between control and experiment groups could have happened by luck, or if it is statistically significant:
ci = sma.stats.proportion_confint(nobs = len(dfr[dfr['operating_sys'] == 'IOS']),
count = len(dfr[(dfr['operating_sys'] == 'IOS') & (dfr['experiment'] == 0)]),
alpha= 0.05, method = 'normal')
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
As seen above:
sns.barplot(data = dfr, x = 'operating_sys', y = 'clicked', hue = 'experiment', ci = None, palette = 'tab20c_r')
plt.title('Click-Through Probability per operating_sys & group'); plt.ylabel('click-through probability')
plt.legend(loc = 'upper right', title = 'experiment'); plt.show()
For Android, check if the click-through probability difference between control and experiment group could have happened by luck, or if it is statistically significant:
ci = sm.stats.proportion.confint_proportions_2indep(
nobs1 = len(dfr[(dfr['operating_sys'] == 'Android') & (dfr['experiment'] == 0)]),
nobs2 = len(dfr[(dfr['operating_sys'] == 'Android') & (dfr['experiment'] == 1)]),
count1 = len(dfr[(dfr['operating_sys'] == 'Android') & (dfr['experiment'] == 0) & (dfr['clicked'] == 1)]),
count2 = len(dfr[(dfr['operating_sys'] == 'Android') & (dfr['experiment'] == 1) & (dfr['clicked'] == 1)]),
method = 'wald', compare='diff', alpha= 0.05, correction=False)
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
As seen above:
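For reference, the Wald interval for a difference of two proportions is (p̂1 − p̂2) ± z·√(p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2). A hand computation with illustrative counts (not the actual Android sub-slice numbers):

```python
import math
from scipy import stats

# Illustrative counts (stand-ins, not the actual Android sub-slice numbers):
n1, x1 = 1200, 160   # control: impressions, clicks
n2, x2 = 1300, 210   # experiment: impressions, clicks

p1, p2 = x1 / n1, x2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # Wald standard error
z = stats.norm.ppf(1 - 0.05 / 2)
diff = p1 - p2
ci_diff = (diff - z * se, diff + z * se)
print(ci_diff)   # the difference is significant at the 5% level if 0 lies outside
```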
dfr['browser'].sort_values().unique()
As seen in the 'operating_sys' section, all devices are mobile, so the browser names above can be simplified. For example, there is no need to keep separate categories named 'Chrome' and 'Chrome Mobile'; these can be merged into a single 'Chrome' category. This is done below for all applicable browser names:
dct_browser = {'Android': 'Android', 'Chrome': 'Chrome', 'Chrome Mobile': 'Chrome', 'Chrome Mobile WebView': 'Chrome',
'Chrome Mobile iOS': 'Chrome', 'Edge Mobile': 'Edge', 'Facebook': 'Facebook', 'Firefox Mobile': 'Firefox',
'Mobile Safari': 'Safari', 'Mobile Safari UI/WKWebView': 'Safari', 'Opera Mini': 'Opera',
'Opera Mobile': 'Opera', 'Pinterest': 'Pinterest', 'Puffin': 'Puffin', 'Samsung Internet': 'Samsung'}
dfr['browser'] = dfr['browser'].map(dct_browser)
sns.countplot(x = dfr['browser'], color = 'lightblue')
plt.title('Total number of users per browser'); plt.ylabel('# of users'); plt.xticks(rotation = 60); plt.show()
As seen above;
sns.countplot(data = dfr, x = 'browser', hue = 'experiment')
plt.title('Number of users per browser & group'); plt.xticks(rotation = 60); plt.ylabel('# of users')
plt.legend(loc = 'upper right', title = 'experiment'); plt.show()
For Chrome, check if the difference of users between control and experiment groups could have happened by luck, or if it is statistically significant:
ci = sma.stats.proportion_confint(nobs = len(dfr[dfr['browser'] == 'Chrome']),
count = len(dfr[(dfr['browser'] == 'Chrome') & (dfr['experiment'] == 0)]),
alpha= 0.05, method = 'normal')
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
As seen above,
This confirms the decision tree's earlier indication that feature 'browser' has a highly uneven distribution of users between the control and experiment groups! (The tree specifically flagged browser type 'Chrome Mobile WebView', which was merged here with the other Chrome-named browsers.)
Chrome is the browser of the majority of users. In addition, the number of Chrome users assigned to the control group is smaller than in the experiment group, and this difference is statistically significant. This suggests a bug in the selection algorithm, and it means that no conclusion can be drawn about the overall difference in click-through probability between the two groups, due to Simpson's paradox.
Given the high number of different user "slices", it may be challenging for the selection system to assign the same number of users per slice to each group. For browsers, however, this should be achieved at least for the most frequent ones in the dataset: Chrome, Facebook, Safari, and Samsung.
sns.barplot(data = dfr, x = 'browser', y = 'clicked', estimator = np.mean, hue = 'experiment',
ci = None, palette = 'tab20c_r')
plt.xticks(rotation = 60); plt.ylabel('click-through probability'); plt.legend(title = 'experiment', loc = 'upper right')
plt.title('Click-Through Probability per browser & group'); plt.show()
For Chrome (which applies for the majority of users), check if the click-through probability difference between control and experiment groups could have happened by luck, or if it is statistically significant:
ci = sm.stats.proportion.confint_proportions_2indep(
nobs1 = len(dfr[(dfr['browser'] == 'Chrome') & (dfr['experiment'] == 0)]),
nobs2 = len(dfr[(dfr['browser'] == 'Chrome') & (dfr['experiment'] == 1)]),
count1 = len(dfr[(dfr['browser'] == 'Chrome') & (dfr['experiment'] == 0) & (dfr['clicked'] == 1)]),
count2 = len(dfr[(dfr['browser'] == 'Chrome') & (dfr['experiment'] == 1) & (dfr['clicked'] == 1)]),
method = 'wald', compare='diff', alpha= 0.05, correction=False)
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
As seen above:
dfr['device_make'].nunique()
print(dfr['device_make'].value_counts()[0:15],
'\n\nThe above devices constitute {}% of all user devices in the dataset.'.\
format(100 * round(dfr['device_make'].value_counts()[0:15].sum() / len(dfr), 3)))
As expected, there are many different types of devices. The 15 most frequent devices are shown above; cumulatively they constitute almost 80% of all devices in the dataset. Regarding the device-make information:
It is not strictly needed for the sanity checks of the A/B test. The assignment algorithm is not expected to assign the same number of users per group for every individual device model; that would be excessive. Balancing the groups across the other user "slices" (day, hour, browser, etc.) is sufficient.
For the sake of data exploration, and to get an idea of the manufacturers' market shares, users could be categorized into fewer groups based solely on device make, not model. However, there are many missing data points: the 'Generic Smartphone' entries constitute more than 50% of the dataset.
Through online data scraping, key device features such as screen size, aspect ratio and resolution could be added to the dataset. This would help in understanding how these features affect the click-through probability, using a machine learning model. However, the missing device data points in this dataset are too many.
A summary of the observations made in the previous sections is shown below:
We conclude that it would NOT be safe to proceed with launching the experimental version "B" of the website survey.
More specifically, the per-feature analyses of the data showed the following:
Recommendations to fix the problem:
This section shows the process of estimating the required sample size for a new test.
The first step is to calculate the baseline click-through probability, i.e. the mean click-through probability observed so far for the "control" version of the survey. Only an estimate of this can be obtained from the available data. Keeping the issues of the previous section in mind as a caveat, and excluding the first day of the experiment (which shows unusual spikes in control-user numbers), the baseline click-through probability is estimated as follows:
print('The baseline click-through probability of the control group is {}.'.\
format(round(dfr[(dfr['date'] != '2020-07-03') & (dfr['experiment'] == 0)].mean()['clicked'], 3)))
However, this is the click-through probability of the control group in the test sample, not of the population of all website users. The 95% confidence interval for the population probability is found below:
ci = sma.stats.proportion_confint(nobs = len(dfr[(dfr['date'] != '2020-07-03') & (dfr['experiment'] == 0)]),
count = len(dfr[(dfr['date'] != '2020-07-03') & (dfr['experiment'] == 0) & (dfr['clicked'] == 1)]),
alpha = 0.05,
method = 'normal')
print('The Confidence Interval is {}.'.format([ci[0].round(3), ci[1].round(3)]))
It is known that the standard error of a proportion increases as the click-through probability grows, as long as the latter is <= 0.50. To be conservative, the highest plausible baseline probability (the upper bound of the confidence interval above) will be used to size the experiment.
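This monotonic behavior of the standard error √(p(1−p)/n) for p <= 0.5 can be checked numerically (n = 1000 here is an arbitrary illustrative sample size):

```python
import math

n = 1000   # arbitrary illustrative sample size

def std_error(p):
    return math.sqrt(p * (1 - p) / n)

probs = [0.1, 0.2, 0.3, 0.4, 0.5]
errors = [std_error(p) for p in probs]
print([round(e, 4) for e in errors])   # strictly increasing up to p = 0.5
```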
The target statistical power (sensitivity) of the experiment is 80%, and a practical significance level (minimum detectable effect) of 0.02 is assumed:
target_power = 0.8
mid_det_effect = 0.02
significance = 0.05
baseline_ctr = ci[1]
# Initial values for the search loop:
power = 0
users_per_group = 0
while power < target_power:   # increase the sample size until the target power is reached
    users_per_group = users_per_group + 1
    power = sm.stats.power.normal_power_het(diff = mid_det_effect,
                                            nobs = users_per_group,
                                            alpha = significance,
                                            std_null = np.sqrt(2 * baseline_ctr * (1 - baseline_ctr)),
                                            std_alternative = np.sqrt(baseline_ctr * (1 - baseline_ctr) +
                                                (baseline_ctr + mid_det_effect) * (1 - (baseline_ctr + mid_det_effect))),
                                            alternative='two-sided')
print('{} users are needed for EACH group, to achieve a sensitivity of {}.'.format(users_per_group, round(power, 2)))
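The iterative search can be cross-checked against the closed-form approximation n = ((z_alpha * sd_null + z_power * sd_alt) / MDE)^2, using the same unequal null/alternative standard deviations. The baseline value below is an assumed illustration, not the interval bound computed above:

```python
import math
from scipy import stats

# Assumed illustrative inputs (the notebook uses the CI upper bound as baseline):
p = 0.15                         # baseline click-through probability
mde = 0.02                       # minimum detectable effect
alpha, target_power = 0.05, 0.80

z_alpha = stats.norm.ppf(1 - alpha / 2)
z_power = stats.norm.ppf(target_power)
sd_null = math.sqrt(2 * p * (1 - p))                          # under H0 (no effect)
sd_alt = math.sqrt(p * (1 - p) + (p + mde) * (1 - p - mde))   # under H1 (effect = MDE)

n_per_group = math.ceil(((z_alpha * sd_null + z_power * sd_alt) / mde) ** 2)
print(n_per_group, 'users per group')
```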
/ END OF NOTEBOOK, THANK YOU! /
© 2021 Michail Mavrogiannis
You are welcome to visit My LinkedIn profile and see my other projects on My GitHub profile!
Michail Mavrogiannis